Implement lance.write_table API and test the simplest round trips #23
Review thread on python/lance/__init__.py (commit "add parameters"; this diff is now outdated):
    ----------
    table : pa.Table
        Apache Arrow Table
    sink : str or `Path`
        return ds.dataset(uri, format=fmt)


    def write_table(table: pa.Table, destination: Union[str, Path], primary_key: str):
Comment: Maybe add a convenience to auto-generate a pk column?

Reply: Should we push that to the application / DB level?

Reply: If we want people to use it as a Python library, then it's probably a good idea to have it. Could be in a wrapper function or something? It should also check for uniqueness there as well.
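A minimal sketch of what such a convenience wrapper could look like, assuming pyarrow and the write_table signature above. write_table_with_pk is a hypothetical name, not something this PR adds:

```python
import pyarrow as pa
import pyarrow.compute as pc

import lance  # provides write_table as added in this PR


def write_table_with_pk(table: pa.Table, destination, primary_key: str = "pk"):
    """Hypothetical wrapper: auto-generate a pk column and check uniqueness."""
    if primary_key not in table.column_names:
        # Auto-generate a monotonically increasing int64 pk column.
        pk = pa.array(range(table.num_rows), type=pa.int64())
        table = table.append_column(primary_key, pk)
    else:
        # Enforce uniqueness of a user-supplied pk column.
        if len(pc.unique(table.column(primary_key))) != table.num_rows:
            raise ValueError(f"column {primary_key!r} is not unique")
    lance.write_table(table, destination, primary_key)
```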
Second review thread, on the same write_table definition:
Comment: So this requires holding everything in memory first, right? If we have a bunch of images on S3, does this mean we need to hold them all in Arrow memory to convert to Lance format?

Reply: Right, so there will be a StreamWriter, which basically opens a DatasetWriter and writes batch records one by one. It is another set of interfaces though. Similar to Parquet: https://arrow.apache.org/docs/cpp/parquet.html#writing-parquet-files
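For illustration, a batch-at-a-time interface in the spirit of pyarrow.parquet.ParquetWriter might look like the sketch below. LanceStreamWriter and its methods are hypothetical and not part of this PR:

```python
import pyarrow as pa


class LanceStreamWriter:
    """Hypothetical streaming interface, modeled on pyarrow.parquet.ParquetWriter.

    Nothing here exists in this PR; it only illustrates the shape of the
    batch-at-a-time API discussed above.
    """

    def __init__(self, destination: str, schema: pa.Schema, primary_key: str):
        self.destination = destination
        self.schema = schema
        self.primary_key = primary_key
        # A real implementation would open a DatasetWriter here.

    def write_batch(self, batch: pa.RecordBatch) -> None:
        # A real implementation would encode and flush this batch, so only
        # one batch needs to be held in Arrow memory at a time.
        raise NotImplementedError

    def close(self) -> None:
        raise NotImplementedError


# Usage sketch: stream record batches (e.g. images fetched from S3) one by one.
# writer = LanceStreamWriter("s3://bucket/images.lance", schema, primary_key="id")
# for batch in batches:
#     writer.write_batch(batch)
# writer.close()
```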
Implements the lance.write_table() API to write lance data and read it back. Closes #3.
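For reference, the simplest round trip this PR tests might look like the following. lance.dataset is assumed here to be the helper wrapping ds.dataset(uri, format=fmt) seen in the diff context above; the name is a guess from that context:

```python
import pyarrow as pa

import lance

# Build a small table with an explicit primary-key column.
table = pa.table({"pk": [1, 2, 3], "value": ["a", "b", "c"]})

# Write it out in Lance format (the API added in this PR).
lance.write_table(table, "/tmp/roundtrip.lance", primary_key="pk")

# Read it back through the dataset helper and verify the round trip.
dataset = lance.dataset("/tmp/roundtrip.lance")
assert dataset.to_table().equals(table)
```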